
HTML API: Preserve raw text contents in serialize #54

Open: sirreal wants to merge 2 commits into trunk from fix/iframe-noembed-noframes-serialize

sirreal commented Jun 11, 2026

Owner

Summary

  • Preserve parser-derived raw-text contents when serializing IFRAME, NOEMBED, and NOFRAMES.

Root cause

WP_HTML_Processor::serialize_token() recognizes these elements as self-contained raw-text elements, but it previously replaced their modifiable text with an empty string before appending the serialized closing tag. As a result, normalize() and full-document serialize() dropped the parsed contents, so that <iframe>x</iframe> became <iframe></iframe>.
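The before/after behavior can be modeled with a small Python sketch. It is only a toy illustration and not the WordPress implementation: the function name `serialize_raw_text_element` and the `preserve_contents` flag are hypothetical, introduced here to contrast the old and new paths.

```python
# Toy model of the described bug; not WordPress code.
# The element set mirrors the three elements named in this PR.
RAW_TEXT_ELEMENTS = {"IFRAME", "NOEMBED", "NOFRAMES"}


def serialize_raw_text_element(tag_name, text, preserve_contents):
    """Serialize one self-contained raw-text element (hypothetical helper).

    preserve_contents=False models the old behavior (contents replaced
    with an empty string); True models the behavior after this change.
    """
    if tag_name.upper() not in RAW_TEXT_ELEMENTS:
        raise ValueError(f"not a raw-text element in this model: {tag_name}")
    name = tag_name.lower()
    body = text if preserve_contents else ""
    return f"<{name}>{body}</{name}>"


before = serialize_raw_text_element("IFRAME", "x", preserve_contents=False)
after = serialize_raw_text_element("IFRAME", "x", preserve_contents=True)
print(before)  # <iframe></iframe>
print(after)   # <iframe>x</iframe>
```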

Behavioral impact

Serialization now emits the contents of IFRAME, NOEMBED, and NOFRAMES literally, including markup-like text and undecoded character references, with NUL bytes already replaced by the parser. Test coverage spans fragment normalization and full-document serialization, including NOFRAMES in HEAD and FRAMESET contexts.

Why literal emission is correct

These contents come from the parser's raw-text handling. Character references are not decoded there, and NUL bytes have already followed the parser/tokenizer replacement behavior exposed by get_modifiable_text(). Escaping or dropping the text changes the parsed document; emitting it literally preserves serialize/re-parse fidelity for the parser-derived contents. This serializer behavior is not a sanitizer and does not broaden set_modifiable_text(), which continues to reject edits to these elements.
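The fidelity argument can be illustrated with a minimal Python sketch. It assumes a toy raw-text parser that takes everything between the opening and closing tag verbatim, without decoding character references; `parse_raw_text` and `serialize` are hypothetical helpers, not part of the HTML API, and the model ignores attributes and other details of real parsing.

```python
import html
import re


def parse_raw_text(markup, tag="iframe"):
    """Toy raw-text parse: return the contents verbatim, no decoding."""
    match = re.fullmatch(
        rf"<{tag}>(.*?)</{tag}>", markup, flags=re.DOTALL | re.IGNORECASE
    )
    if match is None:
        raise ValueError("expected a single raw-text element")
    return match.group(1)


def serialize(text, tag="iframe", escape=False):
    """Emit the element, either literally or with HTML escaping applied."""
    body = html.escape(text, quote=False) if escape else text
    return f"<{tag}>{body}</{tag}>"


original = "<iframe><b>&amp; 1 < 2</b></iframe>"
text = parse_raw_text(original)

# Literal emission round-trips: re-parsing yields the same text.
literal_round_trip = parse_raw_text(serialize(text))

# Escaping does not: the raw-text parser never decodes the added
# references, so the re-parsed text differs from the original.
escaped_round_trip = parse_raw_text(serialize(text, escape=True))

print(literal_round_trip == text)  # True
print(escaped_round_trip == text)  # False
```

In this model, escaping turns `&amp;` into `&amp;amp;`, and a raw-text re-parse keeps that string as-is, which is the kind of document change the description refers to.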

Validation

  • WP_TESTS_SKIP_INSTALL=1 ./vendor/bin/phpunit --group html-api --filter Tests_HtmlApi_WpHtmlProcessor_Serialize - OK (101 tests, 174 assertions)
  • vendor/bin/phpcs --standard=phpcs.xml.dist src/wp-includes/html-api/class-wp-html-processor.php tests/phpunit/tests/html-api/wpHtmlProcessor-serialize.php - OK

See #65372.

The serializer was discarding the raw-text contents of IFRAME, NOEMBED, and NOFRAMES even though get_modifiable_text() already returns the browser-equivalent raw text for those elements.

Let those elements follow the same raw emission path as SCRIPT and STYLE, preserving contents while retaining existing NUL and newline normalization.

See #65372.
github-actions Bot commented Jun 11, 2026


The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the props-bot label.

Core Committers: Use this line as a base for the props when committing in SVN:

Props jonsurrell.

To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook.
